
    On Utilizing Association and Interaction Concepts for Enhancing Microaggregation in Secure Statistical Databases

    This paper presents a possibly pioneering endeavor to tackle microaggregation techniques (MATs) in secure statistical databases by resorting to the principles of associative neural networks (NNs). The prior art has improved the available solutions to the MAT by incorporating proximity information: the size of the data set is recursively reduced by excluding the points that are farthest from the centroid and the points that are closest to these farthest points. Although this method is extremely effective, it arguably uses only the proximity information while ignoring the mutual interaction between the records. In this paper, we argue that inter-record relationships can be quantified in terms of the following two entities: 1) their "association" and 2) their "interaction." This means that records that are not necessarily close to each other may still be "grouped," because their mutual interaction, quantified by invoking transitive-closure-like operations on the latter entity, could be significant, as suggested by the theoretically sound principles of NNs. By repeatedly invoking the inter-record associations and interactions, the records are grouped into sets of cardinality "k," where k is the security parameter of the algorithm. Our experimental results, obtained on artificial data and on benchmark real-life data sets, demonstrate that the newly proposed method is superior to the state of the art, not only from the information loss (IL) perspective but also with respect to a criterion that combines the IL and the disclosure risk (DR).
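    To make the proximity-only baseline that the paper improves upon concrete, here is a minimal sketch of an MDAV-style microaggregation heuristic: it repeatedly seeds a group with the record farthest from the running centroid and fills the group with that record's k-1 nearest neighbours. This is illustrative only; it does not implement the association/interaction method proposed above, and the function name and simplified grouping rule are our own.

    ```python
    import numpy as np

    def mdav_groups(data: np.ndarray, k: int) -> list:
        """Partition records into proximity-based groups of cardinality >= k."""
        remaining = list(range(len(data)))
        groups = []
        while len(remaining) >= 2 * k:
            pts = data[remaining]
            centroid = pts.mean(axis=0)
            # The record farthest from the centroid seeds the next group.
            far = int(np.argmax(np.linalg.norm(pts - centroid, axis=1)))
            dists = np.linalg.norm(pts - pts[far], axis=1)
            members = set(np.argsort(dists)[:k])   # the far point and its k-1 nearest
            groups.append([remaining[i] for i in members])
            remaining = [r for i, r in enumerate(remaining) if i not in members]
        if remaining:                               # leftover records form the last group
            groups.append(remaining)
        return groups
    ```

    Each group would then be replaced by its centroid, so that every released record is indistinguishable from at least k-1 others with respect to the aggregated attributes.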

    Ultimate Order Statistics-Based Prototype Reduction Schemes

    The objective of Prototype Reduction Schemes (PRSs) and Border Identification (BI) algorithms is to reduce the number of training vectors while simultaneously attempting to guarantee that the classifier built on the reduced design set performs as well, or nearly as well, as the classifier built on the original design set. In this paper, we push the limits of the field of PRSs to see whether we can obtain a classification accuracy comparable to the optimal by condensing the information in the data set into a single training point. We demonstrate that such PRSs exist and are attainable, and show that the design and implementation of such schemes work with the recently introduced paradigm of Order Statistics (OS)-based classifiers. These classifiers, referred to as Classification by Moments of Order Statistics (CMOS), are essentially anti-Bayesian in their modus operandi. In this paper, we demonstrate the power and potential of CMOS to yield single-element PRSs which are either "selective" or "creative", where in each case we resort to a non-parametric or a parametric paradigm, respectively. We also report a single-feature single-element creative PRS. All of these solutions have been used to achieve classification for real-life data sets from the UCI Machine Learning Repository, where we have followed an approach that is similar to the Naïve-Bayes' (NB) strategy, although it is essentially of an anti-Naïve-Bayes' paradigm. The amazing facet of this approach is that the training set can be reduced to a single pattern from each of the classes, which is, in turn, determined by the CMOS features. It is even more fascinating that the scheme can be rendered operational by using the information in a single feature of such a single data point. In each of these cases, the accuracy of the proposed PRS-based approach is very close to the optimal Bayes' bound and is almost comparable to that of the SVM.
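    The following is a loose, hedged reconstruction of the flavour of a "creative" single-element PRS: each class is condensed into one synthetic point placed at a non-central order statistic (a quantile) rather than at the mean, and classification is by the nearest prototype. It is not the authors' exact CMOS formulation; the quantile choice q=2/3 and the function names are our own illustrative assumptions.

    ```python
    import numpy as np

    def fit_quantile_prototypes(X, y, q=2/3):
        """One 'creative' prototype per class: the per-feature q-th quantile."""
        return {label: np.quantile(X[y == label], q, axis=0)
                for label in np.unique(y)}

    def predict(prototypes, X):
        labels = list(prototypes)
        P = np.stack([prototypes[l] for l in labels])       # (classes, features)
        dists = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2)
        return np.array(labels)[dists.argmin(axis=1)]

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    protos = fit_quantile_prototypes(X, y)
    print(predict(protos, np.array([[0.0, 0.0], [3.0, 3.0]])))   # -> [0 1]
    ```

    In the single-feature variant the abstract mentions, the same comparison would be carried out on one coordinate of the two prototype points alone.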

    Fault-Tolerant Routing in Mobile Ad Hoc Networks

    On utilizing dependence-based information to enhance micro-aggregation for secure statistical databases

    We consider the micro-aggregation problem, which involves partitioning a set of individual records in a micro-data file into a number of mutually exclusive and exhaustive groups. This problem, which seeks the best partition of the micro-data file, is known to be NP-hard and has been tackled using many heuristic solutions. In this paper, we demonstrate that, in the process of developing micro-aggregation techniques (MATs), it is expedient to incorporate information about the dependence between the random variables in the micro-data file. This can be achieved by pre-processing the micro-data before invoking any MAT, in order to extract the useful dependence information from the joint probability distribution of the variables in the micro-data file, and then accomplishing the micro-aggregation on the "maximally independent" variables, thus confirming the conjecture of Domingo-Ferrer et al. [This conjecture, proposed by Domingo-Ferrer et al. (IEEE Trans Knowl Data Eng 14(1):189-201, 2002), states that micro-aggregation can be enhanced by incorporating dependence-based information between the random variables of the micro-data file, i.e., by working with (selecting) the maximally independent variables. Domingo-Ferrer et al. proposed to select one variable from among the set of highly correlated variables inferred via the correlation matrix of the micro-data file. In this paper, we demonstrate that this process can be automated, and that it is advantageous to select the "most independent" variables by using methods distinct from those involving the correlation matrix.] Our results, on real-life and artificial data sets, show that including such information enhances the process of determining how many variables are to be used, and which of them should be used, in the micro-aggregation process.
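    As a rough illustration of the pre-processing step, the sketch below greedily keeps the variables that are least correlated with those already kept, so that micro-aggregation can run on a "maximally independent" subset. Note the hedge: for simplicity this uses the correlation matrix, whereas the paper argues for methods distinct from it; the threshold value is a hypothetical choice of ours.

    ```python
    import numpy as np

    def select_independent_vars(data: np.ndarray, max_abs_corr: float = 0.5) -> list:
        """Greedily keep columns whose |correlation| with kept columns is low."""
        corr = np.corrcoef(data, rowvar=False)
        kept = []
        for j in range(data.shape[1]):
            if all(abs(corr[j, k]) <= max_abs_corr for k in kept):
                kept.append(j)
        return kept

    rng = np.random.default_rng(0)
    a = rng.normal(size=1000)
    micro = np.column_stack([a, a + 0.01 * rng.normal(size=1000),  # near-duplicates
                             rng.normal(size=1000)])
    print(select_independent_vars(micro))   # -> [0, 2]: the redundant column is dropped
    ```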

    A new frontier in novelty detection: Pattern recognition of stochastically episodic events

    A particularly challenging class of PR problems in which the, generally required, representative set of data drawn from the second class is unavailable, has recently received much consideration under the guise of One-Class (OC) classification. In this paper, we extend the frontiers of OC classification by the introduction of a new field of problems open for analysis. In particular, we note that this new realm deviates from the standard set of OC problems based on the following characteristics: The data contains a temporal nature, the instances of the classes are “interwoven”, and the labelling procedure is not merely impractical - it is almost, by definition, impossible, which results in a poorly defined training set. As a first attempt to tackle these problems, we present two specialized classification strategies denoted by Scenarios S 1 and S 2 respectively. In Scenarios S 1, the data is such that standard binary and one-class classifiers can be applied. Alternatively, in Scenarios S 2, the labelling challenge prevents the application of binary classifiers, and instead, dictates a novel application of OC classifiers. The validity of these scenarios has been demonstrated for the exemplary domain involving the Comprehensive Nuclear Test-Ban-Treaty (CTBT), for which our research endeavour has also developed a simulation model. As far as we know, our research in this field is of a pioneering sort, and the results presented here are novel
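    A small sketch of the Scenario S2 style of set-up: only "background" instances are available for training, so a one-class classifier must flag the episodic events as outliers. This uses scikit-learn's OneClassSVM on synthetic stand-in data, not the CTBT simulation model developed in the paper; the distribution parameters are ours.

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    background = rng.normal(0.0, 1.0, size=(500, 3))   # well-sampled first class
    events = rng.normal(4.0, 0.5, size=(10, 3))        # rare, unlabelled episodes

    clf = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(background)
    print(clf.predict(events))   # -1 marks points the model treats as novel
    ```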

    A formal analysis of why heuristic functions work

    Many optimization problems in computer science have been proven to be NP-hard, and it is unlikely that polynomial-time algorithms that solve these problems exist unless P=NP. Alternatively, they are solved using heuristic algorithms, which provide a sub-optimal solution that, hopefully, is arbitrarily close to the optimal. Such problems are found in a wide range of applications, including artificial intelligence, game theory, graph partitioning, database query optimization, etc. Consider a heuristic algorithm, A. Suppose that A could invoke one of two possible heuristic functions. The question of determining which heuristic function is superior has typically demanded a yes/no answer, one which is often substantiated by empirical evidence. In this paper, by using Pattern Classification Techniques (PCT), we propose a formal, rigorous theoretical model that provides a stochastic answer to this problem. We prove that, given a heuristic algorithm, A, that could utilize either of two heuristic functions, H1 or H2, to find the solution to a particular problem, if the accuracy of evaluating the cost of the optimal solution by using H1 is greater than the accuracy of evaluating the cost using H2, then H1 has a higher probability than H2 of leading to the optimal solution. This hitherto unproven conjecture has been the basis for designing numerous algorithms, such as the A* algorithm and its variants. Apart from formally proving the result, we also address the corresponding database query optimization problem, which has been open for at least two decades. To validate our proofs, we report empirical results on database query optimization techniques involving a few well-known histogram estimation methods.
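    A toy Monte-Carlo illustration of the claim (not the paper's formal proof): if estimator H1 evaluates costs more accurately than H2, it should more often rank the truly optimal candidate first. The noise levels standing in for the two heuristics' accuracies are hypothetical values of ours.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    trials, n_candidates = 10_000, 5

    def hit_rate(noise_std: float) -> float:
        """Fraction of trials in which the noisy estimator picks the true optimum."""
        true_costs = rng.uniform(0, 10, size=(trials, n_candidates))
        estimates = true_costs + rng.normal(0, noise_std, size=true_costs.shape)
        return float(np.mean(estimates.argmin(1) == true_costs.argmin(1)))

    print("H1 (accurate):", hit_rate(0.5))   # picks the optimum more often
    print("H2 (noisy):   ", hit_rate(2.0))
    ```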

    Modeling a domain in a tutorial-like system using learning automata

    The aim of this paper is to present a novel approach to modeling a knowledge domain for teaching material in a Tutorial-like system. In this approach, the Tutorial-like system is capable of presenting teaching material within a Socratic model of teaching. The corresponding questions are of a multiple-choice type, and the material increases in difficulty. This enables the Tutorial-like system to present the teaching material in different chapters, where each chapter represents a level of difficulty that is harder than the previous one. We attempt to achieve the entire learning process using the Learning Automata (LA) paradigm. In order for the Domain model to present an increased difficulty in the teaching Environment, we propose to correspondingly reduce the range of the penalty probabilities of all actions by incorporating a scaling factor μ. We show that such a scaling renders it more difficult for the Student to infer the correct action within the LA paradigm. To the best of our knowledge, the concept of modeling teaching material with increasing difficulty using an LA paradigm is unique. The main results we have obtained are that increasing the difficulty of the teaching material can affect the learning of Normal and Below-Normal Students by increasing their learning time, but it seems to have no effect on the learning behavior of Fast Students. The proposed representation has been tested for different benchmark Environments, and the results show that the difficulty of the Environments can be increased by decreasing the range of the penalty probabilities. For example, for some Environments, decreasing the range of the penalty probabilities by 50% increases the difficulty of learning for Normal Students by more than 60%.
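    A hedged sketch of the core idea: a linear reward-inaction (L_RI) learning automaton faces an Environment whose penalty probabilities are compressed by a scaling factor μ, making the actions harder to distinguish and convergence slower. The penalty values, learning rate, and convergence threshold below are illustrative choices of ours, not values taken from the paper.

    ```python
    import numpy as np

    def steps_to_converge(penalties, mu=1.0, lr=0.05, thresh=0.99, seed=0):
        """Iterations an L_RI automaton needs to commit to one action."""
        rng = np.random.default_rng(seed)
        c = mu * np.asarray(penalties)       # scaled penalty probabilities
        p = np.full(len(c), 1 / len(c))      # action-probability vector
        for t in range(1, 200_000):
            a = rng.choice(len(p), p=p)
            if rng.random() >= c[a]:         # reward: L_RI updates only on reward
                p = (1 - lr) * p
                p[a] += lr
                p /= p.sum()                 # guard against floating-point drift
            if p.max() >= thresh:
                return t
        return None

    print("mu=1.0:", steps_to_converge([0.2, 0.8], mu=1.0))
    print("mu=0.5:", steps_to_converge([0.2, 0.8], mu=0.5))  # harder Environment
    ```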

    Multi-class pairwise linear dimensionality reduction using heteroscedastic schemes

    Linear dimensionality reduction (LDR) techniques have become increasingly important in pattern recognition (PR) because they permit a relatively simple mapping of the problem onto a lower-dimensional subspace, leading to simple and computationally efficient classification strategies. Although the field is well developed for the two-class problem, the corresponding issues encountered when dealing with multiple classes are far from trivial. In this paper, we argue that, as opposed to the traditional multi-class LDR schemes, when dealing with multiple classes it is not expedient to treat the task as a multi-class problem per se. Rather, we show that it is better to treat it as an ensemble of Chernoff-based two-class reductions onto different subspaces, whence the overall solution is achieved by resorting to either a Voting, a Weighting, or a Decision Tree strategy. The experimental results obtained on benchmark datasets demonstrate that the proposed methods are not only efficient but also yield accuracies comparable to those obtained by the optimal Bayes classifier.
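    The sketch below illustrates only the structure of the pairwise-reduction-plus-Voting scheme, with an important hedge: it uses Fisher's criterion as a simple stand-in for the heteroscedastic Chernoff criterion the paper actually optimizes. Every class pair gets its own one-dimensional subspace, a threshold classifier decides each pairwise contest, and the final label is chosen by majority vote.

    ```python
    import numpy as np
    from itertools import combinations

    def fisher_direction(Xa, Xb):
        """Fisher discriminant direction for one class pair (Chernoff stand-in)."""
        Sw = np.cov(Xa, rowvar=False) + np.cov(Xb, rowvar=False)
        w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]),
                            Xa.mean(0) - Xb.mean(0))
        return w / np.linalg.norm(w)

    def predict_voting(X, y, X_test):
        labels = np.unique(y)
        votes = np.zeros((len(X_test), len(labels)), dtype=int)
        for i, j in combinations(range(len(labels)), 2):
            Xa, Xb = X[y == labels[i]], X[y == labels[j]]
            w = fisher_direction(Xa, Xb)
            thr = 0.5 * ((Xa @ w).mean() + (Xb @ w).mean())
            a_side = (Xa @ w).mean() > thr          # side of the threshold class i occupies
            wins_i = (X_test @ w > thr) == a_side   # True where the pairwise vote goes to i
            votes[np.arange(len(X_test)), np.where(wins_i, i, j)] += 1
        return labels[votes.argmax(axis=1)]
    ```

    The Weighting and Decision Tree strategies mentioned in the abstract would replace the simple majority vote with weighted pairwise verdicts or a sequence of pairwise eliminations, respectively.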